Contextual Bandits with Latent Confounders: An NMF Approach

Authors

  • Rajat Sen
  • Karthikeyan Shanmugam
  • Murat Kocaoglu
  • Alexandros G. Dimakis
  • Sanjay Shakkottai
Abstract

Motivated by online recommendation and advertising systems, we consider a causal model for stochastic contextual bandits with a latent low-dimensional confounder. In our model, there are L observed contexts and K arms of the bandit. The observed context influences the reward obtained through a latent confounder variable with cardinality m (m ≪ L, K). The arm choice and the latent confounder causally determine the reward, while the observed context is correlated with the confounder. Under this model, the L × K mean reward matrix U (for each context in [L] and each arm in [K]) factorizes into non-negative factors A (L × m) and W (m × K). This insight enables us to propose an ε-greedy NMF-Bandit algorithm that designs a sequence of interventions (selecting specific arms) that achieves a balance between learning this low-dimensional structure and selecting the best arm to minimize regret. Our algorithm achieves a regret of O(L poly(m, log K) log T) at time T, as compared to O(LK log T) for conventional contextual bandits, assuming a constant gap between the best arm and the rest for each context. These guarantees are obtained under mild sufficiency conditions on the factors that are weaker versions of the well-known Statistical RIP condition. We further propose a class of generative models that satisfy our sufficient conditions, and derive a lower bound of O(Km log T). These are the first regret guarantees for online matrix completion with bandit feedback when the rank is greater than one. We further compare the performance of our algorithm with the state of the art on synthetic and real-world data sets.
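The factorization at the heart of the model can be illustrated with a small simulation. The sketch below is not the paper's NMF-Bandit algorithm (which designs specific interventions and carries the stated regret guarantee); it is a simplified ε-greedy loop in Python that periodically refits a rank-m NMF on the running empirical reward means and acts greedily on the reconstructed L × K matrix. All function names, constants, and the toy instance are illustrative assumptions, not the authors' implementation.

import numpy as np

rng = np.random.default_rng(0)

def nmf(X, m, iters=200, eps=1e-9):
    # Plain multiplicative-update NMF: approximate nonnegative X ~= A @ W with inner dimension m.
    L, K = X.shape
    A = rng.random((L, m)) + eps
    W = rng.random((m, K)) + eps
    for _ in range(iters):
        W *= (A.T @ X) / (A.T @ A @ W + eps)
        A *= (X @ W.T) / (A @ W @ W.T + eps)
    return A, W

def nmf_epsilon_greedy(true_U, m, T=20000, epsilon=0.05, refit_every=500):
    # Contexts arrive uniformly at random; arm k in context l pays a Bernoulli(true_U[l, k]) reward.
    L, K = true_U.shape
    sums = np.zeros((L, K))     # sum of observed rewards per (context, arm)
    counts = np.zeros((L, K))   # number of pulls per (context, arm)
    U_hat = np.full((L, K), 0.5)
    regret = 0.0
    for t in range(1, T + 1):
        l = rng.integers(L)                      # observed context
        if rng.random() < epsilon:
            k = rng.integers(K)                  # explore: uniform random arm
        else:
            k = int(np.argmax(U_hat[l]))         # exploit the low-rank estimate
        r = float(rng.random() < true_U[l, k])   # Bernoulli reward
        sums[l, k] += r
        counts[l, k] += 1
        regret += true_U[l].max() - true_U[l, k]
        if t % refit_every == 0:
            # Refit a rank-m NMF on the empirical means; unobserved entries are
            # crudely imputed with the global mean (an assumption of this sketch).
            global_mean = sums.sum() / max(counts.sum(), 1.0)
            means = np.where(counts > 0, sums / np.maximum(counts, 1.0), global_mean)
            A, W = nmf(means, m)
            U_hat = A @ W
    return regret

# Toy instance matching the latent-confounder model: U = A W with rank m << L, K.
L, K, m = 30, 15, 3
A_true = rng.dirichlet(np.ones(m), size=L)   # row l: mixture over latent confounder states
W_true = rng.random((m, K))                  # mean reward of each arm under each latent state
print("cumulative regret:", nmf_epsilon_greedy(A_true @ W_true, m))

The refit step is what shares information across contexts: a rank-m factorization has only (L + K)m degrees of freedom rather than LK, which is the intuition behind the improved regret scaling of L poly(m, log K) instead of LK claimed in the abstract.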


Similar papers

Latent Contextual Bandits: A Non-Negative Matrix Factorization Approach

We consider the stochastic contextual bandit problem with a large number of observed contexts and arms, but with a latent low-dimensional structure across contexts. This low dimensional (latent) structure encodes the fact that both the observed contexts and the mean rewards from the arms are convex mixtures of a small number of underlying latent contexts. At each time, we are presented with an ...


Latent Contextual Bandits and their Application to Personalized Recommendations for New Users

Personalized recommendations for new users, also known as the cold-start problem, can be formulated as a contextual bandit problem. Existing contextual bandit algorithms generally rely on features alone to capture user variability. Such methods are inefficient in learning new users’ interests. In this paper we propose Latent Contextual Bandits. We consider both the benefit of leveraging a set o...


A Survey on Contextual Multi-armed Bandits

4 Stochastic Contextual Bandits; 4.1 Stochastic Contextual Bandits with Linear Realizability Assumption; 4.1.1 LinUCB/SupLinUCB; 4.1.2 LinREL/SupLinREL; 4.1.3 CoFineUCB; 4.1.4 Thompson Sampling with Linear Payoffs...


Resourceful Contextual Bandits

We study contextual bandits with ancillary constraints on resources, which are common in real-world applications such as choosing ads or dynamic pricing of items. We design the first algorithm for solving these problems that improves over a trivial reduction to the non-contextual case. We consider very general settings for both contextual bandits (arbitrary policy sets, Dudik et al. (2011)) and ...


PAC-Bayesian Analysis of Contextual Bandits

We derive an instantaneous (per-round) data-dependent regret bound for stochastic multi-armed bandits with side information (also known as contextual bandits). The scaling of our regret bound with the number of states (contexts) N goes as ...



Venue: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS) 2017, Fort Lauderdale, Florida, USA. JMLR: W&CP volume 54.

Publication year: 2017